SPlit: An Optimal Method for Data Splitting


Abstract

In this article, we propose an optimal method, referred to as SPlit, for splitting a dataset into training and testing sets. SPlit is based on the method of support points (SP), which was initially developed for finding representative points of a continuous distribution. We adapt SP for subsampling from a dataset using a sequential nearest neighbor algorithm. We also extend SP to deal with categorical variables so that SPlit can be applied to both regression and classification problems. The implementation of SPlit on real datasets shows a substantial improvement in the worst-case performance of several modeling methods compared with the commonly used random splitting procedure.
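The two steps the abstract describes (compute support points, then match them to data points with a sequential nearest neighbor pass) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes numeric features only (categorical variables would first need a numeric encoding, not shown), column standardization, a test-set size of n = round(N * test_ratio), and support points obtained from a fixed-point update that sets the gradient of the energy distance to zero. The jittered random initialization, the iteration count, and the names `support_points` and `split_indices` are hypothetical choices made for illustration.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def support_points(data, n, iters=50, seed=0, eps=1e-10):
    """Approximate n support points of `data` (an assumed fixed-point scheme).

    Each update solves grad(energy distance) = 0 for point i while holding
    the other points at their previous values:
      x_i <- (sum_m y_m/|x_i-y_m| + (N/n) sum_{j!=i} (x_i-x_j)/|x_i-x_j|) / q_i
      q_i  = sum_m 1/|x_i-y_m|
    """
    rng = random.Random(seed)
    N, d = len(data), len(data[0])
    # Hypothetical initialization: a random subset of the data, slightly jittered.
    x = [[v + rng.gauss(0.0, 1e-3) for v in data[k]] for k in rng.sample(range(N), n)]
    for _ in range(iters):
        new_x = []
        for i in range(n):
            xi = x[i]
            q, pull = 0.0, [0.0] * d
            for y in data:  # attraction toward the data
                w = 1.0 / max(_dist(xi, y), eps)
                q += w
                for k in range(d):
                    pull[k] += w * y[k]
            push = [0.0] * d
            for j in range(n):  # repulsion from the other support points
                if j != i:
                    dij = max(_dist(xi, x[j]), eps)
                    for k in range(d):
                        push[k] += (xi[k] - x[j][k]) / dij
            new_x.append([(pull[k] + (N / n) * push[k]) / q for k in range(d)])
        x = new_x  # all points are updated simultaneously
    return x


def split_indices(data, test_ratio=0.2, iters=50, seed=0):
    """Return (train_indices, test_indices) using support points + sequential NN."""
    N, d = len(data), len(data[0])
    n = max(1, round(N * test_ratio))
    # Standardize each column; constant columns keep a scale of 1.
    means = [sum(r[k] for r in data) / N for k in range(d)]
    sds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in data) / N) or 1.0
           for k in range(d)]
    z = [[(r[k] - means[k]) / sds[k] for k in range(d)] for r in data]
    remaining, test = set(range(N)), []
    for p in support_points(z, n, iters, seed):
        # Sequential nearest neighbor: each support point claims the closest
        # data point that has not been claimed yet.
        nearest = min(remaining, key=lambda m: _dist(p, z[m]))
        test.append(nearest)
        remaining.remove(nearest)
    return sorted(remaining), sorted(test)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random(), rng.random()] for _ in range(100)]
    train, test = split_indices(data, test_ratio=0.2)
    print(len(train), len(test))  # prints: 80 20
```

Removing each matched point from the pool guarantees that the test set contains exactly n distinct rows. This brute-force version costs O(iters * n * (N + n) * d) time and is only meant for small examples; a k-d tree would be the usual choice for the nearest neighbor queries on larger data.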


Similar resources

An Optimal Model for Medicine Preparation Using Data Mining

Introduction: Lack of financial resources and liquidity are the main problems of hospitals. Pharmacies are one of the sectors that affect the turnover of hospitals and due to lack of forecast for the use and supply of medicines, at the end of the year, encounter over-inventory, large volumes of expired medicines, and sometimes shortage of medicines. Therefore, medicine prediction using availabl...

Optimal Data Split Methodology for Model Validation

The decision to incorporate cross-validation into validation processes of mathematical models raises an immediate question – how should one partition the data into calibration and validation sets? We answer this question systematically: we present an algorithm to find the optimal partition of the data subject to certain constraints. While doing this, we address two critical issues: 1) that the ...

On Optimal Data Split for Generalization Estimation and Model Selection

Modeling with flexible models, such as neural networks, requires careful control of the model complexity and generalization ability of the resulting model. Whereas general asymptotic estimators of generalization ability have been developed over recent years (e.g., [9]), it is widely acknowledged that in most modeling scenarios there isn't sufficient data available to reliably use these estimato...

Journal

Journal title: Technometrics

Year: 2021

ISSN: 0040-1706, 1537-2723

DOI: https://doi.org/10.1080/00401706.2021.1921037